ABSTRACT



This brief review explains some alternatives to the
classical scoring method of summing correct responses.
The novel procedures attempt in some way to retrieve
and use even the information in the wrong responses.

In the psychometric literature there have been
studies proposing ways of using item responses other
than the method of summing correct responses.

Research on the diagnostic value of multiple-choice
tests has usually led to discussion of differential
weighting of the incorrect or inappropriate options.
Differential weighting assumes a priori that the
incorrect options can at least be rank-ordered.
Guttman & Schlesinger (1966) have developed a method
called facet design, which generates a systematic
construction of distractors that differ in degree of
attraction. Facet design solves the problem of
assigning meaningful differential weights to each
response. Diagnostic development [...] by using a
deviate form of Raven's Progressive Matrices. The
middle cell of a 3 x 3 matrix is used as the stimulus,
thereby taking advantage of an added function in the
diagonal. The Raven matrix uses the extreme lower
right cell as the stem stimulus.

Historically, option weighting goes back to
E. K. Strong and his work on interest inventories
(Strong, 1943). Strong noted that there are no
"correct" responses. Options that discriminated among
various occupational groups were weighted empirically.
Responses to items were then used as variables in a
discriminant function analysis that differentiated
occupational groups.

There have been repeated suggestions in the
literature of getting at the process involved in a
response rather than simply scoring answers as 'right'
or 'wrong.' The so-called 'wrong' answers can sometimes
convey information (frequently of a diagnostic nature
if the test is properly designed) concerning the
process of human thinking (Laurendeau and Pinard, 1962).

Glaser, Damrin, and Gardner (1954) have developed
a procedure called the tab item, which reveals the
strategy used in a problem-solving, trouble-shooting
situation. A record is made of the sequence of steps
taken and the number of steps needed to arrive at the
correct answer. Coffman (1967) has called the tab item
a "test item with feedback."

Nedelsky (1954) has suggested rewarding the
ability to avoid gross errors. He devised a procedure
for distinguishing the D students from the F students.
The F students were identified by the inordinate number
of options they chose that were regarded as
ridiculously implausible. Poor students who at least
demonstrated the ability to avoid gross errors received
D's. Lord and Novick (1968) refer to these gross errors
as worst distractors:

If we wish to recognize the possibility
of partial information or perhaps
misinformation, then we can assign
different scores to the various incorrect
responses. For example, one distractor
might be designed to ferret out common
misinformation. We might call such a
distractor, which is literally the least
correct response, the worst distractor.

A possible scoring scheme might assign
a score of one to the correct response
and a score of -S to a worst distractor
response where 0<S<1 (p. 313-314).
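
The scheme in the quotation can be sketched in a few lines of
Python. This is only an illustration, not Lord and Novick's own
formulation: it reads the worst-distractor score as a penalty of
size S, and it assumes that every other distractor scores zero, a
point the quotation leaves open. The function and parameter names
are hypothetical.

```python
def score_item(response, key, worst, s=0.5):
    """Score one multiple-choice response.

    key   -- the correct option
    worst -- the option designated the worst distractor
    s     -- size of the penalty, required to satisfy 0 < s < 1
    """
    if not 0 < s < 1:
        raise ValueError("s must lie strictly between 0 and 1")
    if response == key:
        return 1.0
    if response == worst:
        return -s
    return 0.0  # assumption: all other distractors score zero


def score_test(responses, keys, worsts, s=0.5):
    """Total score: the sum of the item scores."""
    return sum(score_item(r, k, w, s)
               for r, k, w in zip(responses, keys, worsts))
```

Under this sketch an examinee who chooses the worst distractor
scores lower than one who chooses any other wrong option, which
is the distinction the classical sum of correct responses cannot
make.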

Dressel & Schmid (1953) have derived a scoring
formula based on the examinee's degree of assurance in
a given answer. Shuford & Massengill (1966), led on by
the expectation of extracting "all of the potentially
available information," devised a scoring system in
which the student maximizes his score by expressing his
'degree of belief probabilities.' The formal procedures
used in both of these studies cannot be applied to
testing very young children because of the verbal
content of the instructions. However, the rationale of
both studies closely parallels what Piaget (1929) has
called conviction.
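
A scoring system that rewards the expression of degree-of-belief
probabilities can be illustrated with the logarithmic scoring
rule. The choice of this rule is an assumption made here for
illustration; the text above does not say which formulas either
study used. The function names are hypothetical.

```python
import math


def log_score(probs, key):
    """Logarithmic score: the log of the probability the student
    assigned to the option that turns out to be correct."""
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    p = probs[key]
    return math.log(p) if p > 0 else float("-inf")


def expected_score(reported, belief):
    """Expected log score when the student's actual beliefs are
    `belief` but the probabilities written down are `reported`."""
    return sum(b * log_score(reported, option)
               for option, b in belief.items() if b > 0)
```

With beliefs of .6, .3, and .1 over three options, reporting those
same values gives a higher expected score under this rule than
overstating the favoured option as .9, so the student has no
incentive to misreport.
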

Coombs, Milholland, and Womer (1956) devised a
novel measurement procedure. The individual selects
and marks the distractors rather than the correct
answer of each item. The rationale for this technique
of scoring is that even though an examinee does not
know the correct answer he may, nevertheless, know that
one or more distractors are wrong. The phenomenon of
knowing that certain distractors are incorrect is
called partial knowledge. Testing for partial knowledge
has little intuitive appeal for use with very young
children, but the procedure of forcing a scanning
strategy over all options is worth investigating.
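
A minimal sketch of scoring by elimination follows. The text
above does not state the scoring rule, so the rule used here is
an assumption: one point for each distractor marked as wrong, and
a penalty of one less than the number of options if the correct
answer is marked as wrong. The function name is hypothetical.

```python
def elimination_score(eliminated, key, n_options):
    """Score one item on which the examinee marks the options
    believed to be wrong.

    eliminated -- set of options the examinee marked as wrong
    key        -- the correct option
    n_options  -- number of options on the item
    """
    # One point per distractor correctly marked as wrong.
    score = len(set(eliminated) - {key})
    # Assumed penalty when the correct answer itself is marked.
    if key in eliminated:
        score -= n_options - 1
    return score
```

On a four-option item, marking all three distractors earns 3,
marking a single distractor earns 1 (partial knowledge), and
marking only the correct answer earns -3.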

Davis and Fifer (1959) weighted options a priori
and reported an increase in reliability from .68 to .76.
The a priori weighting was devised by a panel of judges
who qualitatively ranked the options of each item.

Guttman's (1941) procedure, consisting of criterion
keying of the options, probably holds the most
psychometric promise. Criterion keying of options
rather than item weighting may give clues concerning
the process underlying responses (Sigel, 1963).

The Guttman procedure also seems worth investigating,
since its main concern is with validity, whereas
studies such as Davis and Fifer (1959) and Jacobs and
Vandeventer (1970) were primarily concerned with
augmenting reliability, with its small but concomitant
validity effect only a secondary concern.



The Jacobs and Vandeventer (1970) study used
the facet design analysis of Guttman to weight options
a priori on Raven's Coloured Progressive Matrices. The
procedure resulted in a statistical increase in
reliability. The authors have little to say concerning
the possibility of their technique contributing in the
area of validity.

Birnbaum (Lord & Novick, 1968) has developed a
three-parameter logistic latent-trait model which
weights items by level of difficulty. Birnbaum's model
has led to the development of the sequential or
tailored testing procedures of Novick (Lord & Novick,
1968). Sequential testing is still in the experimental
stage, but its feasibility has been partly supported by
claims of high reliability coefficients. The most
promising outcome of sequential testing may prove to be
the use of the computer both in test administration and
test scoring.
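
For orientation, the three-parameter logistic model is usually
written as the probability of a correct response given ability.
The sketch below uses the common parameterization with a
discrimination parameter a, a difficulty parameter b, and a lower
asymptote c; the symbols and the function name are not taken from
the text above, and the scaling constant that some presentations
place in the exponent is omitted.

```python
import math


def p_correct(theta, a, b, c):
    """Probability of a correct response under the
    three-parameter logistic model.

    theta -- examinee ability
    a     -- item discrimination
    b     -- item difficulty
    c     -- lower asymptote (chance of success at very low ability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals the item difficulty, the probability lies
halfway between c and 1; for a = 1, b = 0, c = .2, and theta = 0
the function returns .6.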

In general, the psychometric studies outlined 
here minimize their potential contribution to testing 
by overindulging in the domain of reliability. 

This review of the psychometric literature is
not directly relevant to the testing problems of young 
children. The attempt has been made to investigate the 
novel ways of using the individual options in a test 
item. The conventional psychometric way of using infor- 
mation in a test item is to score the item as 1 if the 
response is congruent with the keyed answer and to score 
the item as 0 otherwise. The total test score is then 
given as the sum of the correct items. There is potential 
information imparted within a wrong response. The classical 
psychometric model ignores this information. 
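
The loss of information can be made concrete with a short sketch.
The function names and the sample data in the note that follows
are hypothetical and serve only to illustrate the point.

```python
def number_right(responses, keys):
    """Classical score: 1 for each response that matches the
    keyed answer, 0 otherwise, summed over items."""
    return sum(1 for r, k in zip(responses, keys) if r == k)


def wrong_choices(responses, keys):
    """The (item index, chosen option) pairs that the classical
    score reduces to 0 and thereby discards."""
    return [(i, r)
            for i, (r, k) in enumerate(zip(responses, keys))
            if r != k]
```

With keys A, B, C, D, the response patterns A, B, D, A and
A, B, A, C both receive a classical score of 2, yet the two
examinees chose different wrong options on items 3 and 4. Any
diagnostic value in that difference is lost once the items are
scored 1 or 0.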

The quest for new ways of using all the information
in a test item has naturally led to item weighting and
partial scoring procedures. Many of the novel procedures
cannot be applied directly to the testing of young
children because of their language limitations.

Some of the more promising procedures seem to be
facet design analysis, branching items, and computer-based
testing. The hallmark of innovative procedures in item
scoring has been the overall concern for reliability and
general disregard for the more rigorous treatment of
validity. Notable exceptions have been the study of gross
error (Nedelsky, 1954) and the tab item (Glaser, Damrin,
and Gardner, 1954). These two approaches attempted to get
at the process underlying a test response.

The truly diagnostic test should reveal
information concerning both what the subject knows
and what he does not know. Diagnostic tests have been
around for some time. These tests generally reveal
information through the binary situation of the subject
either passing or failing a test item. Tests are needed
wherein each response option of each item reveals a
certain amount of diagnostic information.